Abstract. Here we prove an asymptotically optimal lower bound on the information
complexity of the $k$-party disjointness function with the unique intersection promise, an
important special case of the well known disjointness problem, and the $\mathrm{AND}_k$-function in
the number in the hand model. Our $\Omega(n/k)$ bound for disjointness improves on an earlier
$\Omega(n/(k \log k))$ bound by Chakrabarti et al. (2003), who obtained an asymptotically tight
lower bound for one-way protocols, but failed to do so for the general case. Our result
eliminates both the gap between the upper and the lower bound for unrestricted protocols
and the gap between the lower bounds for one-way protocols and unrestricted protocols.



1. Introduction 

Primarily, communication complexity, introduced by Yao [10], deals with the amount 
of communication that is needed in distributed computation, but apart from distributed 
computation, nowadays communication complexity has found applications in virtually all 
fields of complexity theory. The book by Kushilevitz and Nisan [9] gives a comprehensive 
introduction to communication complexity and its applications. 

Suppose that $k$ players, each of them knowing exactly one argument of a function
$f(x_1, \ldots, x_k)$ with $k$ arguments, want to evaluate the function for the input that is
distributed among them. Clearly, to succeed at this task the players need to communicate.
Here we consider the case that the players communicate by writing to a blackboard that 
is shared by all players. The rules that determine who writes which message to the black- 
board are usually called a protocol. The protocol terminates if the value of the function 
can be inferred from the contents of the blackboard, the so-called transcript of the protocol. 
Then the communication complexity of the function is the minimum number of bits that 
the players need to write to the blackboard in the worst case to jointly compute the result. 
This setting is usually called the number in the hand model since each part of the input is 
exclusively known to a single player who figuratively hides the input in his hand. In the 
randomized version of this model each player has access to a private source of unbiased 
independent random bits and his actions may depend on his input and his random bits. 
For a randomized $\varepsilon$-error protocol the output of the protocol may be different from the value



Key words and phrases: computational complexity, communication complexity. 




SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE



© A. Gronemeier 

Creative Commons Attribution-NoDerivs License






A. GRONEMEIER 



of the function $f$ with probability at most $\varepsilon$. The $\varepsilon$-error randomized communication
complexity of a function is defined in the obvious way. A formal definition of $k$-party protocols
can be found in [9]. Note that there are also other models of multi-party communication,
but these models are not the topic of this paper.

In recent publications [5, 2, 3, 4] lower bounds on the communication complexity of 
functions have been obtained by using information theoretical methods. In this context 
communication complexity is supplemented by an information theoretical counterpart, the 
information complexity of a function. Roughly, the information complexity of a function $f$ is
the minimal amount of information that the transcript of a protocol for $f$ must reveal about
the input. Besides being a lower bound for the communication complexity, information 
complexity has additional nice properties with respect to so-called direct sum problems. 

1.1. Our Result 

In this paper we will prove an asymptotically optimal lower bound on the communi- 
cation complexity of the multi-party set disjointness problem with the unique intersection 
promise. 

Definition 1.1. In the $k$-party set disjointness problem each of the players is given the
characteristic vector of a subset of an $n$-element set. It is promised that the subsets are
either pairwise disjoint or that there is a single element that is contained in all subsets and
that the subsets are disjoint otherwise. The players have to distinguish these two cases: the
output of a protocol for set disjointness should be 0 in the first case and 1 in the second
case. If the promise is broken, then the players may give an arbitrary answer.
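
As an illustration of Definition 1.1, the following minimal sketch evaluates the promise problem centrally, that is, with all inputs in one place rather than by a communication protocol. The function name and the convention of returning `None` when the promise is broken are assumptions made for this example only; at least two players are assumed.

```python
from typing import List, Optional

def unique_intersection_disjointness(sets: List[List[int]]) -> Optional[int]:
    """sets[i] is the 0/1 characteristic vector of player i's subset (k >= 2)."""
    k, n = len(sets), len(sets[0])
    # counts[j] = number of players whose subset contains element j
    counts = [sum(s[j] for s in sets) for j in range(n)]
    common = [j for j in range(n) if counts[j] == k]   # contained in all subsets
    shared = [j for j in range(n) if counts[j] >= 2]   # contained in >= 2 subsets
    if not shared:
        return 0       # pairwise disjoint
    if len(common) == 1 and shared == common:
        return 1       # one element in all subsets, disjoint otherwise
    return None        # promise broken: any answer would be allowed
```

For example, three players holding `[1,0,0]`, `[0,1,0]`, `[0,0,1]` form a disjoint instance, while `[1,1,0]`, `[0,1,0]`, `[0,1,1]` intersect uniquely in the second element.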

Here we will prove the following result about the randomized communication complexity 
of the multi-party set disjointness problem in the number in the hand model. 

Theorem 1.2. For every sufficiently small constant $\varepsilon > 0$ the randomized $\varepsilon$-error
communication complexity of the $k$-party set disjointness problem with the unique intersection
promise is bounded from below by $\Omega(n/k)$.

By the upper bound shown in [4] this result is asymptotically optimal with respect 
to the number of players $k$ and the size of the inputs $n$. An important application of
this problem is the proof of a lower bound for the memory requirements of certain data 
stream algorithms [1]. Our improvement of the lower bound for disjointness does not have 
a significant impact on this application. But we think that the disjointness problem is 
interesting and important on its own since it is a well-known basic problem in communication 
complexity theory [1, 3, 4, 9]. Up to now the best known lower bound was $\Omega(n/(k \log k))$
by Chakrabarti, Khot, and Sun [4], who also proved an asymptotically optimal lower bound 
for one-way protocols. This result left a gap both between the upper and the lower bound 
and between the lower bounds for one-way protocols and unrestricted protocols. Our result 
closes these gaps. 

Like the earlier results, our lower bound is based on an information theoretical approach. 
The main ingredient of this approach is a lower bound on the information complexity of 
the $\mathrm{AND}_k$-function, the Boolean conjunction of $k$ bits. Since Theorem 1.2 will be a simple
corollary of this result, and more importantly, since $\mathrm{AND}_k$ is a basic building block of
any computation, the lower bound on the information complexity of $\mathrm{AND}_k$ is the main
result of this paper. We postpone the precise statement of this result to Theorem 3.2 in
Section 3 because some preparatory definitions are needed beforehand. But we stress here
that our result also closes the gap between the upper and lower bound on the conditional
information complexity of $\mathrm{AND}_k$ for unrestricted protocols and the gap between the lower
bounds on the information complexity of $\mathrm{AND}_k$ for one-way protocols and unrestricted
protocols that was left open in [4].

1.2. Related Work 

The general disjointness problem without the unique intersection promise has a long 
history in communication complexity theory. Here we focus only on recent results for the 
multi-party set disjointness problem with the unique intersection promise, and especially 
on lower bounds that rely on information complexity arguments. For older results we refer 
the reader to the book by Kushilevitz and Nisan [9] and the references therein. 

Alon, Matias, and Szegedy [1] proved an $\Omega(n/k^4)$ lower bound for multi-party set
disjointness and applied this bound to prove lower bounds for the memory requirements of
data stream algorithms. Bar-Yossef, Jayram, Kumar, and Sivakumar [3] improved this to a
lower bound of $\Omega(n/k^2)$. They introduced the direct sum approach on which later results,
including our result, are based and proved that the information complexity of $\mathrm{AND}_k$ is
bounded from below by $\Omega(1/k^2)$. Chakrabarti, Khot, and Sun [4] improved the lower
bound for the information complexity of $\mathrm{AND}_k$ to $\Omega(1/(k \log k))$ and thereby improved
the lower bound for multi-party set disjointness to $\Omega(n/(k \log k))$. They also proved an
asymptotically optimal lower bound for one-way protocols, a restricted model in which the
players communicate in a predetermined order. Our result improves on these results, but
furthermore we think that our proof technique is a useful contribution to the framework for
which Bar-Yossef et al. [3] coined the term "information statistics". Bar-Yossef et al. use this
term for the combination of information theory and other statistical metrics on probability
spaces. We use the direct sum approach from [3], but instead of the Hellinger distance that
is used in [3] we use the Kullback Leibler distance. Since the Kullback Leibler distance
is closely related to mutual information, we do not lose precision in the transition from
information theory to statistical distance measures. By this, we are able to prove sharper
bounds. Like Chakrabarti et al. [4], we take a closer look at the analytical properties of the
functions that are involved. Our improvements on this result are also due to the fact that
our Kullback Leibler distance based arguments are very close to the information theory
domain.

2. Preliminaries 
2.1. Notation 

We use lower case letters for constants and variables and upper case letters for random
variables. If the random variables $X$ and $Y$ have the same distribution, we briefly write
$X \sim Y$. For vector-valued variables we use a boldface font. For example, $\mathbf{X} = (X_1, \ldots, X_k)$
is a random vector whose components are the random variables $X_i$ for $i = 1, \ldots, k$. In this
case let $\mathbf{X}_{-i} = (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_k)$ denote the vector $\mathbf{X}$ without the $i$th
component. A boldface zero $\mathbf{0}$ and boldface one $\mathbf{1}$ denote the all-zero vector and all-one vector of
appropriate size, respectively. Thus $\mathbf{X}_{-i} = \mathbf{0}$ says that $X_j = 0$ for all $j \in \{1, \ldots, k\} \setminus \{i\}$.
For sums we sometimes do not explicitly specify the bounds of summation and
just write $\sum_i a_i$. In this case the sum is taken over the set of all values of $i$ for which $a_i$ is
meaningful. This set must be derived from context. For example, the sum $\sum_v f(\Pr\{X = v\})$
should be taken over all values $v$ in the range of $X$. All logarithms, denoted by $\log$, are
with respect to base 2.



2.2. Information Theory 

Here we can merely define our notation for the basic quantities from information theory
and cite some results that are needed in this paper. For a proper introduction to information
theory we refer the reader to the book by Cover and Thomas [6]. In the following let
$h_2$ denote the binary entropy function $h_2(p) = -p \log p - (1 - p) \log(1 - p)$ for $p \in [0, 1]$.
Let $X$, $Y$, and $Z$ be random variables and let $E$ be an event, for example the event $Y = y$.
Then $H(X)$ denotes the entropy of the random variable $X$ and $H(X \mid E)$ denotes the entropy
of $X$ with respect to the conditional distribution of $X$ given that the event $E$ occurred.
If there are several events separated by commas, then we analogously use the conditional
distribution of $X$ given that all of the events occurred. Let $H(X \mid Y)$ denote the conditional
entropy of $X$ given $Y$. Recall that $H(X \mid Y) = \sum_y \Pr\{Y = y\} \cdot H(X \mid Y = y)$. If we condition
on several variables, we separate the variables by commas. If we mix events and variables
in the condition, we first list the variables, after that we list the events, for example
$H(X \mid Y, Z = z)$. The mutual information of $X$ and $Y$ is $I(X : Y) = H(X) - H(X \mid Y)$ and
$I(X : Y \mid E) = H(X \mid E) - H(X \mid Y, E)$ is the mutual information of $X$ and $Y$ with respect to
the conditional distribution of $X$ and $Y$ given that the event $E$ occurred. The conditional
mutual information of $X$ and $Y$ given $Z$ is $I(X : Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)$. Recall that
$I(X : Y \mid Z) = \sum_z \Pr\{Z = z\} \cdot I(X : Y \mid Z = z)$.

Suppose that the random variables $X$ and $Y$ have the same range. Then the Kullback
Leibler distance of their distributions is $D(X, Y) = \sum_v \Pr\{X = v\} \log \frac{\Pr\{X = v\}}{\Pr\{Y = v\}}$. If
$\Pr\{X = v\} = 0$ in the above sum, then the corresponding term is 0 independently of the
value of $\Pr\{Y = v\}$, by continuity arguments. If $\Pr\{X = v\} > 0$ and $\Pr\{Y = v\} = 0$ for some
$v$, then the whole sum is defined to be equal to $\infty$. If $E$ is an event, then $(X \mid E)$ denotes the
conditional distribution of $X$ given that the event $E$ occurred, for example $D((X \mid E), X)$ is
the Kullback Leibler distance of the conditional distribution of $X$ given that the event $E$
occurred and the distribution of $X$. Recall that the mutual information of $X$ and $Y$ is the
Kullback Leibler distance of the joint distribution $(X, Y)$ and the product distribution of
the marginal distributions:

\[ I(X : Y) = \sum_{x, y} \Pr\{X = x, Y = y\} \log \frac{\Pr\{X = x, Y = y\}}{\Pr\{X = x\} \cdot \Pr\{Y = y\}} . \]
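
The definitions above can be made concrete with a short sketch. It assumes that distributions are given as dictionaries from values to probabilities; the function names are hypothetical. Mutual information is computed as the Kullback Leibler distance between a joint distribution and the product of its marginals.

```python
from collections import defaultdict
from math import log2

def kl(p, q):
    """Kullback Leibler distance D(p, q) in bits; p, q map values to probabilities."""
    total = 0.0
    for v, pv in p.items():
        if pv == 0:
            continue                  # such a term is 0 regardless of q
        qv = q.get(v, 0.0)
        if qv == 0:
            return float("inf")       # p puts mass where q has none
        total += pv * log2(pv / qv)
    return total

def mutual_information(joint):
    """I(X:Y) for joint = {(x, y): probability}, via the product of the marginals."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), pr in joint.items():
        px[x] += pr
        py[y] += pr
    product = {(x, y): px[x] * py[y] for x in px for y in py}
    return kl(joint, product)
```

A uniform bit copied to $Y$ gives one bit of mutual information, and two independent uniform bits give zero.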

The following lemma is a useful tool for the proof of lower bounds on the Kullback Leibler 
distance of distributions. A proof of the log sum inequality can be found in [6]. 

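Lemma 2.1 (Log sum inequality). For nonnegative numbers $a_i$ and $b_i$, where $i = 1, \ldots, n$,

\[ \sum_i a_i \log \frac{a_i}{b_i} \ge \Bigl( \sum_i a_i \Bigr) \log \frac{\sum_i a_i}{\sum_i b_i} . \]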

Suppose that the random variables $X$ and $Y$ have the same finite range $R$. Then the
total variation distance of their distributions is $V(X, Y) = \frac{1}{2} \sum_v |\Pr\{X = v\} - \Pr\{Y = v\}|$.
It is a well-known fact (see e.g. [7]) that $V(X, Y) = \max_{S \subseteq R} |\Pr\{X \in S\} - \Pr\{Y \in S\}|$.
The following lemma by Kullback relates the Kullback Leibler distance of distributions to
their total variation distance.

Lemma 2.2 (Kullback [8]). Suppose that $X$ and $Y$ are random variables that have the same
finite range. Then $D(X, Y) \ge 2 \cdot V(X, Y)^2$.
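
As a sanity check of Lemma 2.2, and not as a proof, the following sketch evaluates the inequality on a grid of pairs of Bernoulli distributions. Distributions are represented as lists of probabilities over a common range; the helper names and the grid are choices made for this example.

```python
from math import log2

def kl_bits(p, q):
    """Kullback Leibler distance in bits for two probability lists."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue
        if b == 0:
            return float("inf")
        total += a * log2(a / b)
    return total

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def pinsker_gap(p, q):
    """D(p, q) - 2 V(p, q)^2, which Lemma 2.2 says is never negative."""
    return kl_bits(p, q) - 2 * total_variation(p, q) ** 2

# Smallest gap over a grid of Bernoulli pairs; it is attained where p = q.
grid = [i / 20 for i in range(1, 20)]
worst = min(pinsker_gap([a, 1 - a], [b, 1 - b]) for a in grid for b in grid)
```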

2.3. Information Complexity 

The notion of the information cost of a protocol was introduced by Chakrabarti, Shi, 
Wirth, and Yao [5]. The information cost of a randomized protocol is the mutual information
of the input and the transcript of the protocol. Then the information complexity of a 
function can be defined in the canonical way. Here we will use the conditional information 
complexity of a function, a refinement that was introduced by Bar-Yossef, Jayram, Kumar, 
and Sivakumar [3]. 

Definition 2.3. Let $B$ be a set, let $f \colon B^k \to \{0, 1\}$ be a function, and let $\mathbf{X} \in B^k$ and
$D$ be random variables. Suppose that $P$ is a randomized $k$-party protocol for $f$ and that
$M(\mathbf{X})$ is the transcript of $P$ for the input $\mathbf{X}$. Then the conditional information cost of $P$
with respect to $\mathbf{X}$ and $D$ is defined by

\[ \mathrm{icost}(P; \mathbf{X} \mid D) = I(M(\mathbf{X}) : \mathbf{X} \mid D) . \]

The conditional $\varepsilon$-error information complexity $\mathrm{IC}_\varepsilon(f; \mathbf{X} \mid D)$ of $f$ w.r.t. $\mathbf{X}$ and $D$ is the
minimal conditional information cost of a communication protocol for $f(\mathbf{X})$ where the
minimum is taken over all randomized $\varepsilon$-error protocols for $f$.

The information complexity of a function is a lower bound for the communication
complexity. A proof of the next theorem can be found in [3].

Theorem 2.4. Let $B$ be a set, let $f \colon B^k \to \{0, 1\}$ be a function, and let $\mathbf{X} \in B^k$ and $D$ be
random variables. Then the $\varepsilon$-error communication complexity of $f$ is bounded from below
by $\mathrm{IC}_\varepsilon(f; \mathbf{X} \mid D)$.

2.4. The Direct Sum Paradigm 

Information complexity has very nice properties with respect to direct sum problems. 

In this section we summarize the approach of Bar-Yossef, Jayram, Kumar and Sivakumar [3] 
using a slightly different terminology. We call a problem / a direct sum problem if it can 
be decomposed into simpler problems of smaller size. 

Definition 2.5. Let $f \colon (B^n)^k \to \{0, 1\}$ be a function and let $\mathbf{x}_i = (x_{i,1}, \ldots, x_{i,n}) \in B^n$
for $i = 1, \ldots, k$. If there are functions $g \colon \{0, 1\}^n \to \{0, 1\}$ and $h \colon B^k \to \{0, 1\}$ such that

\[ f(\mathbf{x}_1, \ldots, \mathbf{x}_k) = g\bigl( h(x_{1,1}, x_{2,1}, \ldots, x_{k,1}), \ldots, h(x_{1,n}, x_{2,n}, \ldots, x_{k,n}) \bigr) \]

then the function $f$ is called a $g$-$h$-direct sum.

Here the goal is to express a lower bound on the conditional information complexity 
of / in terms of the conditional information complexity of the simpler function h and the 
parameter n. In order for this approach to work, the joint distribution of the inputs of h 
and the condition must have certain properties. As a first requirement, the condition must 
partition the distribution of the inputs into product distributions. 






Definition 2.6. Let $B$ be a set and let $\mathbf{X} = (X_1, \ldots, X_k) \in B^k$ and $D$ be random variables.
The variable $D$ partitions $\mathbf{X}$, if for every $d$ in the support of $D$ the conditional distribution
$(\mathbf{X} \mid D = d)$ is the product distribution of the distributions $(X_i \mid D = d)$ for $i = 1, \ldots, k$.

The function / can be decomposed into instances of the function h if the distribution 
of the inputs of / satisfies our second requirement. 

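Definition 2.7. Let $B$ be a set, let $g \colon \{0, 1\}^n \to \{0, 1\}$ and $h \colon B^k \to \{0, 1\}$ be functions,
and let $\mathbf{X} \in B^k$ be a random variable. If for every $i \in \{1, \ldots, n\}$, for every $\mathbf{a} \in B^k$, and for
every $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_n) \in (B^k)^n$ such that $\mathbf{x}_j \in \mathrm{support}(\mathbf{X})$ for all $j$

\[ g\bigl( h(\mathbf{x}_1), \ldots, h(\mathbf{x}_{i-1}), h(\mathbf{a}), h(\mathbf{x}_{i+1}), \ldots, h(\mathbf{x}_n) \bigr) = h(\mathbf{a}) \]

then the distribution of $\mathbf{X}$ is called collapsing for $g$ and $h$.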

If these two requirements are met, then the conditional information complexity of / can 
be expressed in terms of the conditional information complexity of h and the parameter n. 

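Theorem 2.8 (Bar-Yossef et al. [3]). Suppose that $f \colon (B^n)^k \to \{0, 1\}$ is a $g$-$h$-direct
sum and that $\mathbf{X} \in B^k$ and $D$ are random variables such that the distribution of $\mathbf{X}$ is
collapsing for $g$ and $h$ and $D$ partitions $\mathbf{X}$. Let $\mathbf{Y} = (\mathbf{Y}_1, \ldots, \mathbf{Y}_k) \in (B^n)^k$ and
$\mathbf{E} \in \mathrm{support}(D)^n$ be random variables and let $Y_i^j$ and $E^j$ denote the projection of $\mathbf{Y}_i$ and $\mathbf{E}$
to the $j$th coordinate, respectively. If the random variables $\mathbf{Y}^j = ((Y_1^j, \ldots, Y_k^j), E^j)$ for
$j = 1, \ldots, n$ are independent and $\mathbf{Y}^j \sim (\mathbf{X}, D)$ for all $j$, then
$\mathrm{IC}_\varepsilon(f; \mathbf{Y} \mid \mathbf{E}) \ge n \cdot \mathrm{IC}_\varepsilon(h; \mathbf{X} \mid D)$.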

This direct sum approach can be applied to the $k$-party set disjointness problem.

Observation 2.9. Let $\mathrm{AND}_\ell$ and $\mathrm{OR}_\ell$ denote the Boolean conjunction and disjunction of
$\ell$ bits, respectively. Then the $k$-party set disjointness problem is an $\mathrm{OR}_n$-$\mathrm{AND}_k$-direct sum.
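
The decomposition in Observation 2.9 can be sketched in a few lines: coordinate $j$ is handed to $\mathrm{AND}_k$ across the $k$ players, and the $n$ results are combined by $\mathrm{OR}_n$. The input layout `x[i][j]` (bit $j$ of player $i$) and the function names are assumptions made for this illustration.

```python
def and_k(bits):
    """Boolean conjunction of the given bits."""
    return int(all(bits))

def or_n(bits):
    """Boolean disjunction of the given bits."""
    return int(any(bits))

def disjointness(x):
    """x[i][j] is bit j of the characteristic vector held by player i."""
    k, n = len(x), len(x[0])
    # Inner function h = AND_k on each coordinate, outer function g = OR_n.
    return or_n([and_k([x[i][j] for i in range(k)]) for j in range(n)])
```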

Consequently, for the proof of Theorem 1.2 it is sufficient to prove a lower bound 
on the conditional information complexity of $\mathrm{AND}_k$ for a distribution that satisfies the
requirements of Theorem 2.8 and, in addition, honors the unique intersection promise. A 
distribution with these properties is defined in the following section. This approach was 
already used in [3] and [4]. 

3. The Information Complexity of $\mathrm{AND}_k$

For the following distribution of $D$ and the input $\mathbf{Z} = (Z_1, \ldots, Z_k)$ of $\mathrm{AND}_k$ the variable
$D$ partitions $\mathbf{Z}$ and the distribution of $\mathbf{Z}$ is collapsing for $\mathrm{OR}_n$ and $\mathrm{AND}_k$. Additionally,
there is at most a single $i$ such that $Z_i = 1$.

Definition 3.1. From here on let $\mathbf{Z} = (Z_1, \ldots, Z_k) \in \{0, 1\}^k$ and $D \in \{1, \ldots, k\}$ be
random variables such that the joint distribution of $\mathbf{Z}$ and $D$ has the following properties:
$D$ is uniformly distributed in $\{1, \ldots, k\}$. For all $i \in \{1, \ldots, k\}$ we have $\Pr\{Z_j = 0 \mid D = i\} = 1$
for $j \ne i$ and $\Pr\{Z_i = 0 \mid D = i\} = \Pr\{Z_i = 1 \mid D = i\} = \frac{1}{2}$.
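
The distribution of Definition 3.1 is small enough to enumerate exactly. The sketch below builds it with exact fractions; the function name and the dictionary layout are assumptions for this example.

```python
from fractions import Fraction

def zd_distribution(k):
    """Return {(z, d): probability} for Definition 3.1, players indexed 1..k."""
    dist = {}
    for d in range(1, k + 1):                 # D is uniform in {1, ..., k}
        for bit in (0, 1):                    # Z_d is a uniform bit given D = d
            z = tuple(bit if j == d else 0 for j in range(1, k + 1))
            dist[(z, d)] = Fraction(1, 2 * k)
    return dist
```

Every input in the support has at most one coordinate equal to 1, so for $k \ge 2$ the value of $\mathrm{AND}_k$ is 0 on the whole support.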

Now we can state the main result of this paper, an asymptotically optimal lower bound
on the information complexity of the $\mathrm{AND}_k$-function for inputs that are distributed
according to the last definition.

Theorem 3.2. Let $\varepsilon < \frac{3}{10} \bigl( 1 - \sqrt{\frac{1}{2} \log \frac{4}{3}} \bigr)$ be a constant. Then there is a constant $c(\varepsilon) > 0$
that does only depend on $\varepsilon$ such that $\mathrm{IC}_\varepsilon(\mathrm{AND}_k; \mathbf{Z} \mid D) \ge c(\varepsilon)/k$.
It is easy to see that $\mathrm{icost}(P; \mathbf{Z} \mid D) = 1/k$ for a trivial deterministic protocol $P$ for
$\mathrm{AND}_k$ where each player in turn writes his input to the blackboard until the first 0 is
written. Therefore our lower bound is optimal. As we have seen, this result immediately
implies Theorem 1.2, the other main result of this paper. In the rest of the paper we will
outline the proof of Theorem 3.2.
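
The information cost of the trivial protocol can be checked by direct enumeration. The following sketch assumes the protocol in which the players write their bits in order and stop after the first 0; since the transcript is a deterministic function of the input, the conditional mutual information given $D = d$ equals the entropy of the transcript given $D = d$. All names are hypothetical.

```python
from collections import defaultdict
from math import log2

def transcript(z):
    """Players write their bits in turn and stop after the first 0."""
    written = []
    for bit in z:
        written.append(bit)
        if bit == 0:
            break
    return tuple(written)

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def trivial_icost(k):
    """I(M(Z) : Z | D) for the trivial protocol and the inputs of Definition 3.1."""
    total = 0.0
    for d in range(k):                         # D takes each value with probability 1/k
        m_dist = defaultdict(float)
        for bit in (0, 1):                     # Z_d is uniform, all other bits are 0
            z = tuple(bit if j == d else 0 for j in range(k))
            m_dist[transcript(z)] += 0.5
        # M is a function of Z, hence I(M : Z | D = d) = H(M | D = d).
        total += entropy(m_dist) / k
    return total
```

Only the first player's bit is ever informative: for every other value of $D$ the first player holds 0 and the transcript is constant.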

3.1. Some Basic Observations 

We start with some basic observations about the joint distribution of the inputs and 
the transcript of a protocol for $\mathrm{AND}_k$ with independent, uniformly distributed inputs.

Definition 3.3. From now on, let $P$ be a fixed randomized $k$-player protocol that computes
$\mathrm{AND}_k$ with error at most $\varepsilon$ and for $\mathbf{x} \in \{0, 1\}^k$ let $M(\mathbf{x})$ denote the transcript of $P$ for
the input $\mathbf{x}$. Let $\mathbf{X} = (X_1, \ldots, X_k)$ be a random variable that is uniformly distributed in
$\{0, 1\}^k$ and let $T = M(\mathbf{X})$ denote the transcript of $P$ for the input $\mathbf{X}$.

Note that the transcript M(x) does depend on x and the random inputs of the players. 
Thus even for a fixed input x the transcript is a random variable whose value depends on 
the random bits used in the protocol. 

A randomized $k$-party protocol can be seen as a deterministic protocol in which the
$i$th player has two inputs: The input to the randomized protocol, in our case $X_i$, and as a
second input the random bits that are used by the $i$th player. Then the first observation
is a restatement of the fact that the set of the inputs (real inputs and random bits) that
correspond to a fixed transcript is a combinatorial rectangle (see [9] for a definition of
combinatorial rectangles).

Observation 3.4 ([3, 4]). Let $\mathbf{x} = (x_1, \ldots, x_k) \in \{0, 1\}^k$ and let $t$ be an element from the
support of $T$. Then $\Pr\{\mathbf{X} = \mathbf{x} \mid T = t\} = \prod_i \Pr\{X_i = x_i \mid T = t\}$.

We omit the simple combinatorial proof of this observation because this basic property
of $k$-party protocols was already used in [3] and [4]. The following observation is an
immediate, but very useful consequence of the previous one.

Observation 3.5. Let $\mathbf{x} = (x_1, \ldots, x_k) \in \{0, 1\}^k$ and let $t$ be an element from the support
of $T$. Then $\Pr\{X_i = x_i \mid T = t, \mathbf{X}_{-i} = \mathbf{x}_{-i}\} = \Pr\{X_i = x_i \mid T = t\}$ for all $i \in \{1, \ldots, k\}$.

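Proof. This observation follows immediately from Observation 3.4: By adding the equality
from Observation 3.4 for $(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_k)$ and $(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_k)$ we
obtain

\[ \Pr\{\mathbf{X}_{-i} = \mathbf{x}_{-i} \mid T = t\} = \prod_{j \ne i} \Pr\{X_j = x_j \mid T = t\} . \]

Using this and Observation 3.4 verbatim yields

\[ \Pr\{X_i = x_i \mid T = t, \mathbf{X}_{-i} = \mathbf{x}_{-i}\}
   = \frac{\Pr\{\mathbf{X} = \mathbf{x} \mid T = t\}}{\Pr\{\mathbf{X}_{-i} = \mathbf{x}_{-i} \mid T = t\}}
   = \Pr\{X_i = x_i \mid T = t\} . \]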
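
The product structure of Observation 3.4 can be checked numerically on a small example. The sketch below uses a toy noisy protocol that is an assumption of this example and not a protocol from the paper: player $i$ writes his bit flipped with probability `q` using private randomness, and the protocol stops after the first written 0. The joint distribution of $(\mathbf{X}, T)$ is enumerated exactly and compared with the product of the marginal posteriors.

```python
from collections import defaultdict
from itertools import product

def joint_distribution(k, q):
    """Joint distribution of (X, T) for the toy protocol with uniform inputs."""
    joint = defaultdict(float)
    for x in product((0, 1), repeat=k):
        for noise in product((0, 1), repeat=k):   # private random bits
            pr = 0.5 ** k
            for e in noise:
                pr *= q if e else 1 - q
            t = []
            for xi, e in zip(x, noise):
                t.append(xi ^ e)                  # noisy copy of the input bit
                if t[-1] == 0:
                    break                         # stop after the first written 0
            joint[(x, tuple(t))] += pr
    return joint

def max_rectangle_violation(k, q):
    """Largest |Pr{X = x | T = t} - prod_i Pr{X_i = x_i | T = t}| over all x, t."""
    joint = joint_distribution(k, q)
    pt = defaultdict(float)
    for (x, t), pr in joint.items():
        pt[t] += pr
    worst = 0.0
    for t in pt:
        marg = [[0.0, 0.0] for _ in range(k)]     # marg[i][b] = Pr{X_i = b | T = t}
        for (x, t2), pr in joint.items():
            if t2 == t:
                for i, b in enumerate(x):
                    marg[i][b] += pr / pt[t]
        for x in product((0, 1), repeat=k):
            prod = 1.0
            for i, b in enumerate(x):
                prod *= marg[i][b]
            worst = max(worst, abs(joint.get((x, t), 0.0) / pt[t] - prod))
    return worst
```

For this toy protocol the violation is zero up to floating point error, as Observation 3.4 predicts for any blackboard protocol with private randomness and independent inputs.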
The next observation relates the joint distribution of $Z_i$ and $M(\mathbf{Z})$ given that $D = i$
to the joint distribution of $X_i$ and $T = M(\mathbf{X})$ given that $\mathbf{X}_{-i} = \mathbf{0}$. Combined with the
previous observations, this will be the basis for the proof of the main result.

Observation 3.6. Let $i \in \{1, \ldots, k\}$. Then $I(M(\mathbf{Z}) : Z_i \mid D = i) = I(T : X_i \mid \mathbf{X}_{-i} = \mathbf{0})$.

Proof. First observe that $\Pr\{\mathbf{Z} = \mathbf{v}, M(\mathbf{Z}) = t \mid D = i\} = \Pr\{\mathbf{X} = \mathbf{v}, T = t \mid \mathbf{X}_{-i} = \mathbf{0}\}$ for every
$\mathbf{v} \in \{0, 1\}^k$ and every $t$ in the support of $M(\mathbf{X})$ and $M(\mathbf{Z})$. This follows from the fact
that the conditional distribution of $\mathbf{X}$ given that $\mathbf{X}_{-i} = \mathbf{0}$ is the same as the conditional
distribution of $\mathbf{Z}$ given that $D = i$, the fact that the random inputs of $P$ are independent of
$\mathbf{X}$ and $\mathbf{Z}$, and the fact that the transcript is a function of the inputs and the random inputs.
Then the claim of the observation is an immediate consequence of the initial observation. ■

3.2. Main Idea of the Proof 

Like the approach of Bar-Yossef et al. [3], our approach is based on the observation that
the distribution of the transcripts of a randomized protocol for $\mathrm{AND}_k$ with small error must
at least be very different for the inputs $\mathbf{X} = \mathbf{0}$ and $\mathbf{X} = \mathbf{1}$. The difference is expressed using
some appropriate metric on probability spaces. Then, by using Observations 3.4 and 3.5,
this result is decomposed into results about the distributions of $(\mathbf{X}, M(\mathbf{X}) \mid \mathbf{X}_{-i} = \mathbf{0})$ which
are finally used to bound the conditional mutual information of $\mathbf{Z}$ and $M(\mathbf{Z})$ given $D$ by
using Observation 3.6. The result from [3] mainly uses the Hellinger distance (see [7]) to
carry out this very rough outline of the proof. We will stick to the rough outline, but our
result will use the Kullback Leibler distance instead of the Hellinger distance. Due to the
limited space in the STACS proceedings we can only present proof sketches of the technical
lemmas in this section. A version of this paper with full proofs can be found on the author's
homepage¹.

We will first decompose the Kullback Leibler distance of the distributions $(T \mid \mathbf{X} = \mathbf{0})$
and $(T \mid \mathbf{X} = \mathbf{1})$ into results about the joint distributions of $X_i$ and $T$ for $i = 1, \ldots, k$. The
result will be expressed in terms of the following function.

Definition 3.7. From now on, let $g(x) = x \log \frac{x}{1 - x}$.

Note that the left hand side of the equation in the following lemma is the Kullback 
Leibler distance of (T|X = 0) and (T|X = 1) if S is the set of all possible transcripts. 

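Lemma 3.8. Let $S$ be a subset of the set of all possible transcripts. Then

\[ \sum_{t \in S} \Pr\{T = t \mid \mathbf{X} = \mathbf{0}\} \cdot \log \frac{\Pr\{T = t \mid \mathbf{X} = \mathbf{0}\}}{\Pr\{T = t \mid \mathbf{X} = \mathbf{1}\}}
   = 2 \sum_i \sum_{t \in S} \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot g(\Pr\{X_i = 0 \mid T = t\}) . \]

Proof Sketch. The proof of this lemma is mainly based on the fact that

\[ \frac{\Pr\{T = t \mid \mathbf{X} = \mathbf{0}\}}{\Pr\{T = t \mid \mathbf{X} = \mathbf{1}\}} = \frac{\Pr\{\mathbf{X} = \mathbf{0} \mid T = t\}}{\Pr\{\mathbf{X} = \mathbf{1} \mid T = t\}} . \]

Then Observation 3.4 can be applied to decompose the log-function into a sum. Finally, we
use that $\Pr\{T = t \mid \mathbf{X} = \mathbf{0}\} = 2 \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot \Pr\{X_i = 0 \mid T = t\}$ by Observation 3.5. ■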
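
The steps of the proof sketch can be spelled out for a single transcript $t$ as follows. This is one way to fill in the sketch, not necessarily the author's full proof; the abbreviation $p_i = \Pr\{X_i = 0 \mid T = t\}$ is introduced here for readability, and $g(x) = x \log \frac{x}{1-x}$ is the function of Definition 3.7.

```latex
\begin{align*}
\log \frac{\Pr\{T = t \mid \mathbf{X} = \mathbf{0}\}}{\Pr\{T = t \mid \mathbf{X} = \mathbf{1}\}}
  &= \log \frac{\Pr\{\mathbf{X} = \mathbf{0} \mid T = t\}}{\Pr\{\mathbf{X} = \mathbf{1} \mid T = t\}}
   && \text{(Bayes, $\mathbf{X}$ uniform)} \\
  &= \sum_i \log \frac{p_i}{1 - p_i}
   && \text{(Observation 3.4)} \\[4pt]
\Pr\{T = t \mid \mathbf{X} = \mathbf{0}\} \cdot \sum_i \log \frac{p_i}{1 - p_i}
  &= \sum_i 2 \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot p_i \log \frac{p_i}{1 - p_i}
   && \text{(Observation 3.5)} \\
  &= 2 \sum_i \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot g(p_i) .
\end{align*}
```

Summing over all $t \in S$ gives the identity of the lemma.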



¹ http://ls2-www.cs.uni-dortmund.de/~gronemeier/
Next, we will express a lower bound on $I(M(\mathbf{Z}) : \mathbf{Z} \mid D)$ in terms of the following function
$f$ and set $B(\alpha)$.

Definition 3.9. From now on, let $f(x) = x \log 2x + \frac{1 - x}{2} \log 2(1 - x)$.

Definition 3.10. Let $B(\alpha)$ denote the set of all transcripts $t$ such that $\Pr\{X_i = 0 \mid T = t\} < \alpha$
for all $i \in \{1, \ldots, k\}$.

The role of the parameter a will become apparent later. The only property that is 
needed for the proof of the following lemma is that a > 1/2. 

Lemma 3.11. Let $\alpha > \frac{1}{2}$ be a constant. Then

\[ I(M(\mathbf{Z}) : \mathbf{Z} \mid D) \ge \frac{1}{k} \sum_i \sum_{t \in B(\alpha)} \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot f(\Pr\{X_i = 0 \mid T = t\}) . \]

Proof Sketch. This lemma can be proved by using that $f(x) = \frac{1}{2}(f_1(x) + f_2(x))$ where
$f_1(x) = x \log 2x + (1 - x) \log 2(1 - x)$ and $f_2(x) = x \log 2x$. It is sufficient to prove that the
lower bound holds for $f_1$ and $f_2$ instead of $f$. To this end one can show that

\[ I(M(\mathbf{Z}) : \mathbf{Z} \mid D) = \frac{1}{k} \sum_i \sum_t \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot f_1(\Pr\{X_i = 0 \mid T = t\}) . \]

Then the bound for $f_1$ is obvious since $f_1(x)$ is nonnegative for all $x \in [0, 1]$. The bound
for $f_2$ uses the fact that $f_1(x) = f_2(x) + f_2(1 - x)$, that $f_2(x) \ge 0$ for $x \in [1/2, 1]$, and that

\[ \sum_t \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot f_2(\Pr\{X_i = 1 \mid T = t\}) \]

is nonnegative. ■

The right hand sides of the equation in Lemma 3.8 and the inequality in Lemma 3.11
look very similar. In fact, if there was a positive constant $c$ such that $c \cdot f(x) \ge g(x)$ for
all $x \in [0, 1]$, then for a complete proof of Theorem 3.2 it would be sufficient to show that
the Kullback Leibler distance of $(T \mid \mathbf{X} = \mathbf{0})$ and $(T \mid \mathbf{X} = \mathbf{1})$ is bounded from below by a
constant $c(\varepsilon)$ if the error of the protocol $P$ is bounded by $\varepsilon$. Unfortunately $f(x) \le 1$ for
$x \in [0, 1]$ while $g(x)$ is not bounded from above for $x \in [0, 1]$. So this naive first idea does
not work. But the function $g(x)$ is bounded in every interval $[0, \beta]$ where $\beta < 1$. The
following lemma shows that we can easily bound $f(x)$ from below in terms of $g(x)$ if we
restrict $x$ to an appropriate interval $[0, \beta]$.

Lemma 3.12. There is a constant $\beta > \frac{1}{2}$ such that $4 \cdot f(x) \ge g(x)$ for all $x \in [0, \beta]$.

This lemma can probably be proved in many ways. By inspection and numeric computations
it is easy to verify that it holds for $\beta \approx 0.829$. Here it is more important to note that
our choice of the function $f$ is one of the crucial points of our proof: The function $g(x)$ is
negative for $x \in [0, \frac{1}{2})$ and nonnegative and increasing for $x \in [\frac{1}{2}, 1]$. Furthermore $g(\frac{1}{2}) = 0$
and in the interval $[\frac{1}{2}, 1]$ the slope of $g(x)$ is bounded from below by a positive constant.
It will become clear in Lemma 3.14 that we have to lower bound $f(x)$ in terms of $g(x)$ for
$x \approx \frac{1}{2} + O(\frac{1}{k})$ where $k$ is the number of players. Recall that $f(x) = \frac{1}{2}(f_1(x) + f_2(x))$ where
$f_1(x) = x \log 2x + (1 - x) \log 2(1 - x)$ and $f_2(x) = x \log 2x$ and that we prove Lemma 3.11
by lower bounding the mutual information of $M(\mathbf{Z})$ and $\mathbf{Z}$ in terms of $f_1(x)$ and $f_2(x)$.
Thus $f_1(x)$ and $f_2(x)$ would be natural candidates for the function $f(x)$. Unfortunately,
neither $f_1(x)$ nor $f_2(x)$ alone works in our proof. The function $f_1(x)$ is nonnegative for
$x \in [0, 1]$, therefore $f_1(x) \ge g(x)$ for $x \in [0, \frac{1}{2}]$, but the slope of $f_1(x)$ is too small in the
interval $[\frac{1}{2}, 1]$. It turns out that $f_1(\frac{1}{2} + \frac{1}{k}) = \Theta(1/k^2)$. If we used the function $f_1(x)$ instead
of $f(x)$ in our proof, we could only obtain an $\Omega(1/k^2)$ lower bound for the information
complexity of $\mathrm{AND}_k$. The function $f_2(x)$ does not suffer from this problem since the slope
of $f_2(x)$ in $[\frac{1}{2}, 1]$ is bounded from below by a constant. But here we have the problem that
$f_2(x)$ is too small for $x \in [0, \frac{1}{2})$. For every constant $c > 0$ such that $c \cdot f_2(x) \ge g(x)$ in the
interval $x \in [\frac{1}{2}, 1]$ we have $g(x) > c \cdot f_2(x)$ in the interval $x \in [0, \frac{1}{2})$. Luckily, for the average
$f(x)$ of $f_1(x)$ and $f_2(x)$ the good properties of the functions are preserved while the bad
properties "cancel out". The bounded slope for $x \in [\frac{1}{2}, 1]$ of $f(x)$ is inherited from $f_2(x)$.
The fact that $f(x)$ is not too small for $x \in [0, \frac{1}{2})$ is inherited from $f_1(x)$.
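
The numeric claim behind Lemma 3.12 can be reproduced with a short grid check. The sketch assumes $f = \frac{1}{2}(f_1 + f_2)$ with $f_1(x) = x \log 2x + (1-x) \log 2(1-x)$ and $f_2(x) = x \log 2x$, and $g(x) = x \log \frac{x}{1-x}$, all with base 2 logarithms; the grid resolution and the small tolerance are choices of this example, and a grid check is evidence rather than a proof.

```python
from math import log2

def xlog2x(x):
    """x * log2(2x), extended continuously by 0 at x = 0."""
    return 0.0 if x == 0 else x * log2(2 * x)

def f1(x):
    return xlog2x(x) + xlog2x(1 - x)      # equals 1 - h2(x)

def f2(x):
    return xlog2x(x)

def f(x):
    return 0.5 * (f1(x) + f2(x))

def g(x):
    return 0.0 if x == 0 else x * log2(x / (1 - x))

def lemma_3_12_holds(beta, steps=10000):
    """Check 4 f(x) >= g(x) on an evenly spaced grid of [0, beta], beta < 1."""
    return all(4 * f(beta * i / steps) >= g(beta * i / steps) - 1e-12
               for i in range(steps + 1))
```

On this grid the inequality holds for `beta = 0.829` and already fails at `x = 0.84`. The two sides touch at $x = \frac{1}{2}$, where both vanish.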

We can use the set $B(\alpha)$ in Lemma 3.11 and the set $S$ in Lemma 3.8 to restrict $t$ to the
transcripts that satisfy $\Pr\{X_i = 0 \mid T = t\} < \beta$ for all $i \in \{1, \ldots, k\}$. Then, by our previous
observations, it is easy to lower bound $f(\Pr\{X_i = 0 \mid T = t\})$ in terms of $g(\Pr\{X_i = 0 \mid T = t\})$.

Definition 3.13. Let $\beta$ be the constant from Lemma 3.12. Recall that $B(\alpha)$ denotes
the set of all transcripts $t$ such that $\Pr\{X_i = 0 \mid T = t\} < \alpha$ for all $i \in \{1, \ldots, k\}$. Then $B$ is
a shorthand notation for the set $B(\beta)$.

Unfortunately, the restriction of $t$ to the set $S = B$ complicates the proof of a lower
bound for the left hand sum in Lemma 3.8 since we remove the largest terms from the sum.
For example, we will see in the proof of Corollary 3.17 that for zero-error protocols the set
$B$ does only contain transcripts for the output 1. Therefore, by the zero-error property,
$\Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\} = 0$ for zero-error protocols and the left hand sum in Lemma 3.8 is equal
to 0. Consequently, without further assumptions that do not hold in general it is impossible
to prove large lower bounds on the sum in Lemma 3.8 for the set $S = B$. However, the
next lemma shows that we can lower bound the sum, if we assume that $\Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\}$
is sufficiently large.

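Lemma 3.14. Suppose that $\Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\} \ge \frac{3}{4}$ and that the error $\varepsilon$ of the protocol $P$
is bounded by $\varepsilon < \frac{3}{10} \bigl( 1 - \sqrt{\frac{1}{2} \log \frac{4}{3}} \bigr)$. Then

\[ \sum_{t \in B} \Pr\{T = t \mid \mathbf{X} = \mathbf{0}\} \cdot \log \frac{\Pr\{T = t \mid \mathbf{X} = \mathbf{0}\}}{\Pr\{T = t \mid \mathbf{X} = \mathbf{1}\}}
   \ge \Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\} \cdot \min \Bigl\{ \log \frac{3}{2},\; 2 \Bigl( 1 - \frac{10}{3} \varepsilon \Bigr)^2 - \log \frac{4}{3} \Bigr\} . \]

Proof Sketch. For the proof of this lemma we consider two cases: If $\Pr\{T \in B \mid \mathbf{X} = \mathbf{1}\} < \frac{1}{2}$
then we can use the log sum inequality (Lemma 2.1) to lower bound the sum on the left
hand side. If $\Pr\{T \in B \mid \mathbf{X} = \mathbf{1}\} \ge \frac{1}{2}$ then the error of the protocol $P$ under the condition
that $T \in B$ must be small both for the input $\mathbf{X} = \mathbf{0}$ and the input $\mathbf{X} = \mathbf{1}$. With this
assumption we can lower bound the left hand side using Lemma 2.2 since in this case the
total variation distance of $(T \mid \mathbf{X} = \mathbf{0}, T \in B)$ and $(T \mid \mathbf{X} = \mathbf{1}, T \in B)$ is large. ■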

Note that, by Lemma 3.8 and the fact that the slope of $g(x)$ is bounded from below by a
positive constant for $x \in [1/2, 1]$, this lower bound can be met if $\Pr\{X_i = 0 \mid T = t\} = \frac{1}{2} + O(\frac{1}{k})$
for all $i \in \{1, \ldots, k\}$ and every $t \in B$.

By Lemma 3.14, under the condition that $\Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\} \ge \frac{3}{4}$ our initial naive plan
of bounding $f$ in terms of $g$ does work. The details of this idea are elaborated on in the
proof of Theorem 3.16. Next, we look at the case that $\Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\}$ is small. It turns
out that this assumption alone already leads to a large lower bound on $I(M(\mathbf{Z}) : \mathbf{Z} \mid D)$.
Lemma 3.15. Let $\alpha$ be a constant subject to $1/2 < \alpha \le 1$. Then

\[ I(M(\mathbf{Z}) : \mathbf{Z} \mid D) \ge \frac{1}{2k} \cdot \Pr\{T \notin B(\alpha) \mid \mathbf{X} = \mathbf{0}\} \cdot (1 - h_2(\alpha)) . \]

Proof Sketch. The proof of this lemma is based on the fact that, by the definition of $B(\alpha)$,
under the condition that $T = t \notin B(\alpha)$ the entropy of $X_i$ is bounded by $h_2(\alpha) < 1$ for at
least one $i$. ■
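
One possible way to expand this sketch is given below. It is a reconstruction under stated assumptions, not the author's full proof: it uses the identity for $I(M(\mathbf{Z}) : \mathbf{Z} \mid D)$ from the proof sketch of Lemma 3.11 with $f_1 = 1 - h_2$, the relation $\Pr\{T = t \mid \mathbf{X} = \mathbf{0}\} = 2 \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot \Pr\{X_i = 0 \mid T = t\}$ from the proof sketch of Lemma 3.8, and the constant $\frac{1}{2k}$ in the statement. For every $t \notin B(\alpha)$ fix an index $i(t)$ with $p_{i(t)} = \Pr\{X_{i(t)} = 0 \mid T = t\} \ge \alpha$.

```latex
\begin{align*}
I(M(\mathbf{Z}) : \mathbf{Z} \mid D)
  &= \frac{1}{k} \sum_i \sum_t \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\}
     \cdot \bigl( 1 - h_2(\Pr\{X_i = 0 \mid T = t\}) \bigr) \\
  &\ge \frac{1}{k} \sum_{t \notin B(\alpha)} \Pr\{T = t \mid \mathbf{X}_{-i(t)} = \mathbf{0}\}
     \cdot \bigl( 1 - h_2(p_{i(t)}) \bigr)
     && \text{(all terms are nonnegative)} \\
  &= \frac{1}{k} \sum_{t \notin B(\alpha)} \frac{\Pr\{T = t \mid \mathbf{X} = \mathbf{0}\}}{2\, p_{i(t)}}
     \cdot \bigl( 1 - h_2(p_{i(t)}) \bigr) \\
  &\ge \frac{1}{2k} \cdot \Pr\{T \notin B(\alpha) \mid \mathbf{X} = \mathbf{0}\} \cdot (1 - h_2(\alpha))
     && \text{($p_{i(t)} \le 1$, $h_2$ decreasing on $[\tfrac{1}{2}, 1]$)} .
\end{align*}
```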

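Now all prerequisites for a full proof of Theorem 3.2 are in place. It is implied by the
following theorem because $P$ was assumed to be an arbitrary $\varepsilon$-error protocol for $\mathrm{AND}_k$.

Theorem 3.16. Let $\varepsilon < \frac{3}{10} \bigl( 1 - \sqrt{\frac{1}{2} \log \frac{4}{3}} \bigr)$ be a constant. If the error of the protocol $P$
is bounded by $\varepsilon$, then there is a constant $c(\varepsilon) > 0$ that does only depend on $\varepsilon$ such that

\[ I(M(\mathbf{Z}) : \mathbf{Z} \mid D) \ge \frac{c(\varepsilon)}{k} . \]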

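Proof. Recall that $B$ is the set of all transcripts $t$ such that $\Pr\{X_i = 0 \mid T = t\} < \beta$ for all
$i \in \{1, \ldots, k\}$ where $\beta$ is the constant from Lemma 3.12. For the proof of the theorem we
will consider two cases.

For the first case, assume that $\Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\} < \frac{3}{4}$. In this case we can apply
Lemma 3.15 with $\alpha = \beta$ and we get

\[ I(M(\mathbf{Z}) : \mathbf{Z} \mid D) \ge \frac{1}{2k} \Pr\{T \notin B \mid \mathbf{X} = \mathbf{0}\} (1 - h_2(\beta)) > \frac{1}{8k} (1 - h_2(\beta)) . \]

Note that in this case the lower bound does not depend on $\varepsilon$ and that, since $\beta > 1/2$, there
is a constant $c_1 > 0$ such that the right hand side of the last inequality is bounded from
below by $c_1/k$.

For the second case, assume that $\Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\} \ge \frac{3}{4}$. In this case we first apply
Lemma 3.11 for $\alpha = \beta$, thus $B(\alpha) = B$, then Lemma 3.12, and finally Lemma 3.8 for the
subset $S = B$ to get

\begin{align*}
I(M(\mathbf{Z}) : \mathbf{Z} \mid D)
  &\ge \frac{1}{k} \sum_i \sum_{t \in B} \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot f(\Pr\{X_i = 0 \mid T = t\}) \\
  &\ge \frac{1}{4k} \sum_i \sum_{t \in B} \Pr\{T = t \mid \mathbf{X}_{-i} = \mathbf{0}\} \cdot g(\Pr\{X_i = 0 \mid T = t\}) \\
  &= \frac{1}{8k} \sum_{t \in B} \Pr\{T = t \mid \mathbf{X} = \mathbf{0}\} \cdot \log \frac{\Pr\{T = t \mid \mathbf{X} = \mathbf{0}\}}{\Pr\{T = t \mid \mathbf{X} = \mathbf{1}\}} .
\end{align*}

Then, by the assumption $\Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\} \ge \frac{3}{4}$, we can apply Lemma 3.14 to obtain

\begin{align*}
I(M(\mathbf{Z}) : \mathbf{Z} \mid D)
  &\ge \frac{1}{8k} \cdot \Pr\{T \in B \mid \mathbf{X} = \mathbf{0}\} \cdot \min \Bigl\{ \log \frac{3}{2},\; 2 \Bigl( 1 - \frac{10}{3} \varepsilon \Bigr)^2 - \log \frac{4}{3} \Bigr\} \\
  &\ge \frac{3}{32k} \cdot \min \Bigl\{ \log \frac{3}{2},\; 2 \Bigl( 1 - \frac{10}{3} \varepsilon \Bigr)^2 - \log \frac{4}{3} \Bigr\} .
\end{align*}

For $\varepsilon < \frac{3}{10} \bigl( 1 - \sqrt{\frac{1}{2} \log \frac{4}{3}} \bigr)$ the minimum in the last inequality is a positive constant that
does only depend on the constant $\varepsilon$. Hence, there is a constant $c_2(\varepsilon) > 0$ that does only
depend on the constant $\varepsilon$ such that the right hand side is bounded from below by $c_2(\varepsilon)/k$.
The claim of the theorem follows from the two cases if we choose $c(\varepsilon) = \min\{c_1, c_2(\varepsilon)\}$. ■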
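
To get a feeling for the constants in the second case, the following sketch evaluates the error threshold and the resulting constant numerically. It assumes that the threshold reads $\varepsilon < \frac{3}{10}(1 - \sqrt{\frac{1}{2} \log \frac{4}{3}})$ and that the second case yields $\frac{3}{32} \min\{\log \frac{3}{2}, 2(1 - \frac{10}{3}\varepsilon)^2 - \log \frac{4}{3}\}$; the names `EPS_MAX` and `c2` are hypothetical.

```python
from math import log2, sqrt

# Largest error for which the second term of the minimum stays positive.
EPS_MAX = 0.3 * (1 - sqrt(0.5 * log2(4 / 3)))

def c2(eps):
    """(3/32) * min{log(3/2), 2 (1 - 10 eps / 3)^2 - log(4/3)}, logs to base 2."""
    return 3 / 32 * min(log2(3 / 2), 2 * (1 - 10 * eps / 3) ** 2 - log2(4 / 3))
```

Under these assumptions the threshold is roughly 0.163, and `c2` decreases to 0 as the error approaches it.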

3.3. A Simple Lower Bound for Zero-Error Protocols 

For zero-error protocols a lower bound can be proved by using only Lemma 3.15. 

Corollary 3.17. For every randomized $k$-player zero-error protocol with input $\mathbf{Z}$ and
transcript $M(\mathbf{Z})$ the conditional information cost satisfies $I(M(\mathbf{Z}) : \mathbf{Z} \mid D) \ge 1/(2k)$.

Proof. Consider the transcript $T$ of the protocol $P$ for the input $\mathbf{X}$. Then the corollary
follows immediately from Lemma 3.15 if we set $\alpha = 1$: Recall that the output of the
protocol can be inferred from the transcript and let $P(t)$ denote the output of the protocol
$P$ for transcript $t$. Suppose that $P(t) = 0$. Then $\Pr\{X_i = 0 \mid T = t\} = 1$ for at least one $i$
since otherwise, by Observation 3.4, $\Pr\{\mathbf{X} = \mathbf{1} \mid T = t\} > 0$ and under the condition $T = t$
the output of $P$ would be wrong with a nonzero probability. Clearly this is not possible
for zero-error protocols, hence $\Pr\{T \notin B(1) \mid P(T) = 0\} = 1$. Under the condition $\mathbf{X} = \mathbf{0}$
the output of $P$ is 0 with probability 1, again by the zero-error property, therefore the last
observation implies that $\Pr\{T \notin B(1) \mid \mathbf{X} = \mathbf{0}\} = 1$ and obviously $1 - h_2(1) = 1$. ■